Language teaching machine

ABSTRACT

A set of machines functions as a language teaching lab. Configured by suitable hardware, software, accessories, or any suitable combination thereof, such a language teaching lab accesses multiple sources and types of data, such as video streams, audio streams, thermal imaging data, eye tracker data, breath anemometer data, biosensor data, accelerometer data, depth sensor data, or any suitable combination thereof. From the accessed data, the language teaching lab detects that the user is pronouncing, for example, a word, a phrase, or a sentence, and then causes presentation of a reference pronunciation of that word, phrase, or sentence. Other apparatus, systems, and methods are also disclosed.

RELATED APPLICATION

This application claims the priority benefit of U.S. Provisional Patent Application No. 62/907,921, titled “LANGUAGE TEACHING MACHINE” and filed Sep. 30, 2019, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The subject matter disclosed herein generally relates to the technical field of special-purpose machines that facilitate teaching language, including software-configured computerized variants of such special-purpose machines and improvements to such variants, and to the technologies by which such special-purpose machines become improved compared to other special-purpose machines that facilitate teaching language. Specifically, the present disclosure addresses systems and methods to facilitate teaching one or more language skills, such as pronunciation of words, to one or more users (e.g., students, children, or any suitable combination thereof).

BACKGROUND

A machine may be configured to teach language skills in the course of interacting with a user by presenting a graphical user interface (GUI) in which a language lesson is shown on a display screen and prompting the user to read aloud a word that the machine causes to appear in the GUI showing the language lesson.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.

FIG. 1 is a network diagram illustrating a network environment suitable for operating a server machine (e.g., a language teaching server machine), according to some example embodiments.

FIG. 2 is a block diagram illustrating components of a headset suitable for use with the server machine, according to some example embodiments.

FIG. 3 is a block diagram illustrating components of a device suitable for use with the server machine, according to some example embodiments.

FIG. 4 is a block diagram illustrating components of the server machine, according to some example embodiments.

FIGS. 5-7 are flowcharts illustrating operations of the server machine in performing a method of teaching a language skill (e.g., pronunciation of a word), according to some example embodiments.

FIG. 8 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium and perform any one or more of the methodologies discussed herein.

DETAILED DESCRIPTION

Example methods (e.g., algorithms) facilitate teaching language, and example systems (e.g., special-purpose machines configured by special-purpose software) are configured to facilitate teaching language. Examples merely typify possible variations. Unless explicitly stated otherwise, structures (e.g., structural components, such as modules) are optional and may be combined or subdivided, and operations (e.g., in a procedure, algorithm, or other function) may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of various example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.

A set of one or more machines (e.g., computers or other devices) may be configured by suitable hardware and software to function collectively as a language teaching lab (e.g., a language teaching laboratory that is fully or partially wearable, portable, or otherwise mobile) for one or more users. Such a language teaching lab may operate based on one or more of various instructional principles, including, for example: that oral comprehension precedes written comprehension; that hearing phonemes occurs early (e.g., first) in learning a language; that auditory isolation from environmental noise (e.g., via one or more earphones) may facilitate learning a language; that oral repetition allows a user to compare a spoken phoneme to a memory of hearing that phoneme (e.g., in a feedback loop); and that mouth movements (e.g., mechanical motions by the user's mouth) are correlated to oral articulation. Accordingly, the one or more machines of the language teaching lab may be configured to access multiple sources and types of data (e.g., one or more video streams, an audio stream, thermal imaging data, eye tracker data, breath anemometer data, biosensor data, accelerometer data, depth sensor data, or any suitable combination thereof), detect from the accessed data that the user is pronouncing, for example, a word, a phrase, or a sentence, and then cause presentation of a reference (e.g., correct or standard) pronunciation of that word, phrase, or sentence. The presentation of the reference pronunciation may include playing audio of the reference pronunciation, playing video of an actor speaking the reference pronunciation, displaying an animated model of a mouth or face speaking the reference pronunciation, displaying such an animated model texture mapped with an image of the user's own mouth or face speaking the reference pronunciation, or any suitable combination thereof.
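
By way of illustration only, and not as a description of any claimed implementation, the following Python-style sketch shows one way such a detect-then-present loop could be organized. All of the names here (SensorFrame, the sensors iterator, recognizer, reference_store, presenter, and their methods) are hypothetical placeholders, not elements of the disclosure.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SensorFrame:
        """One synchronized snapshot of the data sources listed above (all optional)."""
        outer_video: Optional[bytes] = None      # frame from an outward-facing camera
        inner_video: Optional[bytes] = None      # frame from a mouth-facing camera
        audio: Optional[bytes] = None            # short audio buffer from a microphone
        thermal: Optional[bytes] = None          # thermal imaging data
        eye_tracker: Optional[dict] = None       # gaze direction and related data
        breath_velocity: Optional[float] = None  # breath anemometer reading
        biosensors: Optional[dict] = None        # HR, GSR, EEG, and similar readings
        accelerometers: Optional[dict] = None    # mouth/tongue/throat movement data
        depth: Optional[dict] = None             # depth sensor readings

    def teaching_loop(sensors, recognizer, reference_store, presenter):
        """Detect that the user is pronouncing something, then present a reference pronunciation."""
        for frame in sensors:                                  # stream of SensorFrame objects
            spoken = recognizer.detect_pronunciation(frame)    # e.g., a word, phrase, or sentence
            if spoken is None:
                continue
            reference = reference_store.lookup(spoken.text)    # reference (correct/standard) pronunciation
            if reference is not None:
                presenter.present(reference)                   # audio, video, or animated mouth model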

FIG. 1 is a network diagram illustrating a network environment 100 suitable for operating a server machine 110 (e.g., a language teaching server machine), according to some example embodiments. The network environment 100 includes the server machine 110, a database 115, a headset 120, and a device 130, all communicatively coupled to each other via a network 190. The server machine 110, with or without the database 115, may form all or part of a cloud 118 (e.g., a geographically distributed set of multiple machines configured to function as a single server), which may form all or part of a network-based system 105 (e.g., a cloud-based server system configured to provide one or more network-based services to the headset 120, the device 130, or both). The server machine 110, the database 115, the headset 120, and the device 130 may each be implemented in a special-purpose (e.g., specialized) computer system, in whole or in part, as described below with respect to FIG. 8.

Also shown in FIG. 1 is a user 132, who may be a person (e.g., a child, a student, a language learner, or any suitable combination thereof). More generally, the user 132 may be a human user (e.g., a human being), a machine user (e.g., a computer configured by a software program to interact with the device 130), or any suitable combination thereof (e.g., a human assisted by a machine or a machine supervised by a human). The user 132 is associated with the device 130 and may be a user of the device 130. For example, the device 130 may be a desktop computer, a vehicle computer, a home media system (e.g., a home theater system or other home entertainment system), a tablet computer, a navigational device, a portable media device, a smart phone, or a wearable device (e.g., a smart watch, smart glasses, smart clothing, or smart jewelry) belonging to the user 132. Likewise, the user 132 is associated with the headset 120 and may be a wearer of the headset 120. For example, the headset 120 may be worn on a head of the user 132 and operated therefrom. In some example embodiments, the headset 120 and the device 130 are communicatively coupled to each other (e.g., independently of the network 190), such as via a wired local or personal network, a wireless networking connection, or any suitable combination thereof.

Any of the systems or machines (e.g., databases, headsets, and devices) shown in FIG. 1 may be, include, or otherwise be implemented in a special-purpose (e.g., specialized or otherwise non-conventional and non-generic) computer that has been modified to perform one or more of the functions described herein for that system or machine (e.g., configured or programmed by special-purpose software, such as one or more software modules of a special-purpose application, operating system, firmware, middleware, or other software program). For example, a special-purpose computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIG. 8, and such a special-purpose computer may accordingly be a means for performing any one or more of the methodologies discussed herein. Within the technical field of such special-purpose computers, a special-purpose computer that has been specially modified (e.g., configured by special-purpose software) by the structures discussed herein to perform the functions discussed herein is technically improved compared to other special-purpose computers that lack the structures discussed herein or are otherwise unable to perform the functions discussed herein. Accordingly, a special-purpose machine configured according to the systems and methods discussed herein provides an improvement to the technology of similar special-purpose machines.

As used herein, a “database” is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof. Moreover, any two or more of the systems or machines illustrated in FIG. 1 may be combined into a single system or machine, and the functions described herein for any single system or machine may be subdivided among multiple systems or machines.

The network 190 may be any network that enables communication between or among systems, machines, databases, and devices (e.g., between the server machine 110 and the device 130). Accordingly, the network 190 may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The network 190 may include one or more portions that constitute a private network, a public network (e.g., the Internet), or any suitable combination thereof. Accordingly, the network 190 may include one or more portions that incorporate a local area network (LAN), a wide area network (WAN), the Internet, a mobile telephone network (e.g., a cellular network), a wired telephone network (e.g., a plain old telephone service (POTS) network), a wireless data network (e.g., a WiFi network or WiMax network), or any suitable combination thereof. Any one or more portions of the network 190 may communicate information via a transmission medium. As used herein, “transmission medium” refers to any intangible (e.g., transitory) medium that is capable of communicating (e.g., transmitting) instructions for execution by a machine (e.g., by one or more processors of such a machine), and includes digital or analog communication signals or other intangible media to facilitate communication of such software.

FIG. 2 is a block diagram illustrating components of the headset 120, according to some example embodiments. The headset 120 is shown as including an inwardly aimed camera 210 (e.g., pointed at, or otherwise oriented to view, the mouth of the user 132 when wearing the headset 120), an outwardly aimed camera 220 (e.g., pointed at, or otherwise oriented to view, an area in front of the user 132 when wearing the headset 120), a microphone 230 (e.g., pointed at or positioned near the mouth of the user 132), and a speaker 240 (e.g., an audio speaker, such as a headphone, earpiece, earbud, or any suitable combination thereof). Some example embodiments of the headset (e.g., for some speech therapy applications) omit the outwardly aimed camera 220 or ignore its video stream.

The headset 120 is also shown as including a thermal imager 250, an eye tracker 251 (e.g., pointed at, or otherwise oriented to view, one or both eyes of the user 132 when wearing the headset 120), an anemometer 252 (e.g., a breath anemometer pointed at or positioned near the mouth of the user 132 when wearing the headset 120), and a set of one or more biosensors 253 (e.g., positioned or otherwise configured to measure heartrate (HR), galvanic skin response (GSR), other skin conditions, an electroencephalogram (EEG), other brain states, or any suitable combination thereof, when the user 132 is wearing the headset 120).

In the example embodiments shown, the headset 120 further includes a set of one or more accelerometers 254 (e.g., positioned or otherwise configured to measure movements, for example, of the mouth of the user 132, the tongue of the user 132, the throat of the user 132, or any suitable combination thereof, when wearing the headset 120), a muscle stimulator 255 (e.g., a set of one or more neuromuscular electrical muscle stimulators positioned or otherwise configured to stimulate one or more muscles of the user 132 when wearing the headset 120), a laser 256 (e.g., a low-power or otherwise child-safe laser pointer aimed at, or otherwise oriented to emit a laser beam toward, an area in front of the user 132 when wearing the headset 120), and a depth sensor 257 (e.g., an infra-red or other type of depth sensor pointed at, or otherwise oriented to detect depth data in, an area in front of the user 132 when wearing the headset 120). As shown in FIG. 2, the various above-described components of the headset 120, or any sub-groupings thereof, are configured to communicate with each other (e.g., via a bus, shared memory, or a switch).

FIG. 3 is a block diagram illustrating components of the device 130, according to some example embodiments. The device 130 is shown as including a reading instruction module 310 (e.g., software-controlled hardware configured to interact with the user 132 in presenting one or more reading tutorials), a speaking instruction module 320 (e.g., software-controlled hardware configured to interact with the user 132 in presenting one or more speech tutorials), an instructional game module 330 (e.g., software-controlled hardware configured to interact with the user 132 in presenting one or more instructional games), and a display screen 340 (e.g., a touchscreen or other display screen). As shown in FIG. 3, the various above-described components of the device 130, or any sub-groupings thereof, are configured to communicate with each other (e.g., via a bus, shared memory, or a switch).

As shown in FIG. 3, the reading instruction module 310, the speaking instruction module 320, the instructional game module 330, or any combination thereof, may form all or part of an app 300 (e.g., a mobile app) that is stored (e.g., installed) on the device 130 (e.g., responsive to or otherwise as a result of data being received by the device 130 via the network 190). Furthermore, one or more processors 399 (e.g., hardware processors, digital processors, or any suitable combination thereof) may be included (e.g., temporarily or permanently) in the app 300, the reading instruction module 310, the speaking instruction module 320, the instructional game module 330, or any suitable combination thereof.

FIG. 4 is a block diagram illustrating components of the server machine 110, according to some example embodiments. The server machine 110 is shown as including a data access module 410, a data analysis module 420, and a pronunciation correction module 430, all configured to communicate with each other (e.g., via a bus, shared memory, or a switch).

As shown in FIG. 4, the data access module 410, the data analysis module 420, and the pronunciation correction module 430 may form all or part of an app 400 (e.g., a server-side app) that is stored (e.g., installed) on the server machine 110 (e.g., responsive to or otherwise as a result of data being received via the network 190). Furthermore, one or more processors 499 (e.g., hardware processors, digital processors, or any suitable combination thereof) may be included (e.g., temporarily or permanently) in the app 400, the data access module 410, the data analysis module 420, the pronunciation correction module 430, or any suitable combination thereof.

Any one or more of the components (e.g., modules) described herein may be implemented using hardware alone (e.g., one or more of the processors 399 or one or more of the processors 499, as appropriate) or a combination of hardware and software. For example, any component described herein may physically include an arrangement of one or more of the processors 399 or 499 (e.g., a subset of or among the processors 399 or 499), as appropriate, configured to perform the operations described herein for that component. As another example, any component described herein may include software, hardware, or both, that configure an arrangement of one or more of the processors 399 or 499, as appropriate, to perform the operations described herein for that component. Accordingly, different components described herein may include and configure different arrangements of the processors 399 or 499 at different points in time or a single arrangement of the processors 399 or 499 at different points in time. Each component (e.g., module) described herein is an example of a means for performing the operations described herein for that component. Moreover, any two or more components described herein may be combined into a single component, and the functions described herein for a single component may be subdivided among multiple components.

Furthermore, according to various example embodiments, components described herein as being implemented within a single system or machine (e.g., a single device) may be distributed across multiple systems or machines (e.g., multiple devices).

According to various example embodiments, the headset 120, the server machine 110, the device 130, or any suitable combination thereof, functions as a mobile language learning lab for the user 132. Such a language learning lab provides instruction in one or more language skills, practice exercises in those language skills, or both, to the user 132. This language learning lab may be enhanced by providing pronunciation analysis, contextual reading, motor-muscle memory recall analysis, auditory feedback, play-object identification, handwriting recognition, gesture recognition, eye-tracking, biometric analysis, or any suitable combination thereof.

FIGS. 5-7 are flowcharts illustrating operations of the server machine 110 in performing a method 500 of teaching a language skill (e.g., pronunciation of a word), according to some example embodiments. Operations in the method 500 may be performed by the server machine 110, the headset 120, the device 130, or any suitable combination thereof, using components (e.g., modules) described above with respect to FIGS. 2-4, using one or more processors (e.g., microprocessors or other hardware processors), or using any suitable combination thereof. As shown in FIG. 5, the method 500 includes operations 510, 520, 530, and 540.

In operation 510, the data access module 410 accesses two video streams and an audio stream. Specifically, the accessed streams are or include outer and inner video streams (e.g., an outer video stream and an inner video stream) and an audio stream that are all provided by the headset 120, which includes the outwardly aimed camera 220, the inwardly aimed camera 210, and the microphone 230. The outwardly aimed camera 220 of the headset 120 has an outward field-of-view that extends away from a wearer of the headset 120 (e.g., the user 132). The outwardly aimed camera 220 generates the outer video stream based on (e.g., using or from) the outward field-of-view. The inwardly aimed camera 210 of the headset 120 has an inward field-of-view that extends toward the wearer of the headset 120. The inwardly aimed camera 210 generates the inner video stream based on (e.g., using or from) the inward field-of-view. The audio stream is generated by the microphone 230. In example embodiments where the headset omits the outwardly aimed camera 220 or ignores the outer video stream (e.g., for some speech therapy applications), the data access module 410 similarly omits or ignores the outer video stream.
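
Purely as an informal illustration of operation 510 (the HeadsetClient connection and its open_stream method are hypothetical, as is the stream naming), a minimal sketch of such stream access might look like this:

    class DataAccessModule:
        """Sketch of operation 510: access the outer video, inner video, and audio streams."""

        def __init__(self, headset_client):
            self.headset = headset_client  # hypothetical connection to the headset 120

        def access_streams(self, use_outer_camera=True):
            streams = {
                "inner_video": self.headset.open_stream("inner_camera"),  # inward field-of-view
                "audio": self.headset.open_stream("microphone"),
            }
            if use_outer_camera:
                # Omitted or ignored in some speech therapy configurations.
                streams["outer_video"] = self.headset.open_stream("outer_camera")
            return streams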

In operation 520, the data analysis module 420 detects, based on the streams accessed in operation 510, a co-occurrence of three things: a visual event in the outward field-of-view, a mouth gesture in the inward field-of-view, and a candidate pronunciation of a word. The visual event is represented in the accessed outer video stream; the mouth gesture is represented in the accessed inner video stream; and the candidate pronunciation is represented in the accessed audio stream. In example embodiments where the headset omits the outwardly aimed camera 220 or ignores the outer video stream (e.g., for some speech therapy applications), the data analysis module 420 detects a co-occurrence of two things: the mouth gesture in the inward field-of-view, and the candidate pronunciation of the word.
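
One simple, assumed way to model such a co-occurrence check is to treat each detection as a timestamped event and require the events to fall within a short time window. The sketch below is illustrative only; the one-second window and the tuple format of the inputs are assumptions, not values from the disclosure.

    def detect_co_occurrence(outer_events, mouth_gestures, candidate_words, window_s=1.0):
        """Sketch of operation 520: report candidate words whose timestamps fall within
        the same short window as a mouth gesture and (when available) a visual event.
        Each input is a list of (timestamp_seconds, payload) tuples."""
        hits = []
        for t_word, word in candidate_words:
            near_gesture = any(abs(t_word - t) <= window_s for t, _ in mouth_gestures)
            near_visual = any(abs(t_word - t) <= window_s for t, _ in outer_events)
            if near_gesture and (near_visual or not outer_events):
                # With no outer video stream, only the two-way co-occurrence is required.
                hits.append((t_word, word))
        return hits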

In operation 530, the pronunciation correction module 430 determines (e.g., by querying the database 115) that the visual event is correlated by the database 115 to the word and correlated to a reference pronunciation of the word. This determination may be performed by optically recognizing an appearance of the word (e.g., via optical character recognition) or an object (e.g., via shape recognition) associated with the word (e.g., by the database 115), within the outward field-of-view. In example embodiments where the headset omits the outwardly aimed camera 220 or ignores the outer video stream (e.g., for some speech therapy applications), the pronunciation correction module 430 determines (e.g., by querying the database 115) that the word is correlated to the reference pronunciation of the word.
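
To make the idea of such a correlation lookup concrete, the following is a minimal, assumed sketch using a local SQLite table; the table name, columns, and schema are illustrative placeholders and are not taken from the disclosure.

    import sqlite3

    def lookup_reference_pronunciation(db_path, visual_label):
        """Sketch of operation 530: ask a database whether a recognized visual event
        (e.g., an OCR'd word or a recognized object label) is correlated to a word
        and to a reference pronunciation of that word."""
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT word, reference_pronunciation FROM word_correlations WHERE visual_label = ?",
                (visual_label,),
            ).fetchone()
            return row  # None if the visual event is not correlated to any word
        finally:
            conn.close()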

In operation 540, the pronunciation correction module 430 causes (e.g., triggers, controls, or commands, for example, via remote signaling) the headset 120 to present the reference pronunciation of the word to the wearer (e.g., the user 132) in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word. In example embodiments where the headset omits the outwardly aimed camera 220 or ignores the outer video stream (e.g., for some speech therapy applications), the pronunciation correction module 430 causes the headset 120 to present the reference pronunciation of the word in response to the detected co-occurrence of the mouth gesture with the candidate pronunciation of the word.

As shown in FIG. 6, in addition to any one or more of the operations previously described, the method 500 may include one or more of operations 620, 621, 622, 623, 640, 641, 650, 651, and 660. One or more of operations 620, 621, 622, and 623 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 520, in which the data analysis module 420 detects the three-way co-occurrence of the visual event in the outward field-of-view with the mouth gesture in the inward field-of-view and with the candidate pronunciation in the audio stream.

In operation 620, as all or part of detecting the visual event, the data analysis module 420 detects a hand gesture or a touch made by a hand of the user 132 occurring on or near a visible word (e.g., displayed by the device 130 within the outward field-of-view). The relevant threshold for nearness may be a distance sufficient to distinguish the visible word from any other words visible in the outward field-of-view. For example, a detected hand gesture may point at the visible word or otherwise identify the visible word (e.g., to indicate a request for assistance in reading or pronouncing the visible word). As another example, a detected touch on the visible word may similarly identify the visible word (e.g., to indicate a request for assistance in reading or pronouncing the visible word). As a further example, the data analysis module 420 may detect a hand of the user 132 handwriting or tracing the visible word (e.g., to indicate a request for assistance in reading or pronouncing the visible word). As a still further example, the data analysis module 420 may detect a hand of the user 132 underlining or highlighting the visible word (e.g., with a pencil, a marker, a flashlight, or other suitable writing or highlighting instrument). Accordingly, the visual event detected in the outward field-of-view may include the hand of the user 132 handwriting the word, tracing the word, pointing at the word, touching the word, underlining the word, highlighting the word, or any suitable combination thereof. In response to the detected hand gesture or touch, the visible word identified by the hand gesture or touch may be treated as the word for which the candidate pronunciation is represented in the audio stream generated by the microphone 230.
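
As an illustrative sketch of the nearness idea only (the pixel threshold, the word bounding-box format, and the fingertip/touch coordinate are all assumptions, and the upstream hand or touch detector is not shown), one might resolve a touch to a single visible word as follows:

    def word_indicated_by_touch(touch_xy, visible_words, max_distance_px=40.0):
        """Sketch of operation 620: given a detected fingertip/touch location and the
        bounding-box centers of words visible in the outward field-of-view, return the
        nearest word only if it is close enough to distinguish it from any other word."""
        if not visible_words:
            return None

        def distance(word):
            wx, wy = word["center"]
            return ((touch_xy[0] - wx) ** 2 + (touch_xy[1] - wy) ** 2) ** 0.5

        ranked = sorted(visible_words, key=distance)
        nearest = ranked[0]
        near_enough = distance(nearest) <= max_distance_px
        # Unambiguous only if no other visible word also falls within the threshold.
        unambiguous = len(ranked) == 1 or distance(ranked[1]) > max_distance_px
        return nearest["text"] if near_enough and unambiguous else None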

In operation 621, as all or part of detecting the visual event, the data analysis module 420 detects that a hand of the user 132 is touching or moving a physical object that represents a word, where the physical object is visible within the outward field-of-view. For example, the physical object may be a model of an animal, such as a horse or a dog, or the physical object may be a toy or a block on which the word is printed or otherwise displayed. The moving of the physical object may be or include rotation in space within the outward field-of-view, translation in space within the outward field-of-view, or both. Accordingly, the visual event detected in the outward field-of-view may include the hand of the user 132 touching the physical object (e.g., the physical model), grasping the physical object, moving the physical object, rotating the physical object, or any suitable combination thereof. In response to the detected touching or moving of the physical object by the hand of the user 132, a word associated with the physical object (e.g., displayed by the physical object or correlated with the physical object by the database 115) may be treated as the word for which the candidate pronunciation is represented in the audio stream generated by the microphone 230.

In operation 622, as all or part of detecting the visual event, the data analysis module 420 detects a trigger gesture (e.g., a triggering gesture) performed by a hand of the user 132 within the outward field-of-view. For example, the trigger gesture may be or include the performing of a predetermined hand shape, a predetermined pose by one or more fingers, a predetermined motion with the hand, or any suitable combination thereof. In response to the detected trigger gesture, the word for which the candidate pronunciation is represented in the audio stream generated by the microphone 230 may be identified for requesting assistance in reading or pronouncing that word (e.g., requesting correction of the candidate pronunciation represented in the audio stream generated by the microphone 230).

In operation 623, as all or part of detecting the visual event, the data analysis module 420 detects a laser spot (e.g., a bright spot of laser light) on a surface of a physical object visible in the outward field-of-view. For example, the headset 120 may include the outwardly aimed laser 256 (e.g., a laser pointer or other laser emitter) configured to designate an object in the outward field-of-view by causing a spot of laser light to appear on a surface of the object in the outward field-of-view, and the outwardly aimed camera 220 of the headset 120 may be configured to capture the spot of laser light and the designated object in the outward field-of-view. Accordingly, the visual event detected in the outward field-of-view may include the spot of laser light being caused to appear on the surface of the physical object in the outward field-of-view. In response to the detected spot of laser light appearing on the surface of the physical object, a word associated with the physical object (e.g., correlated with the physical object by the database 115) may be treated as the word for which the candidate pronunciation is represented in the audio stream generated by the microphone 230.
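
For illustration only, a very simple brightness-thresholding sketch of such laser-spot localization (not the disclosed detector) is shown below; the brightness and area thresholds are assumptions, and a practical detector would likely also use color and motion cues.

    import numpy as np

    def find_laser_spot(frame_gray, min_brightness=250, max_area_px=100):
        """Sketch of operation 623: locate a small, very bright region (a candidate
        laser spot) in a grayscale frame from the outwardly aimed camera 220."""
        bright = frame_gray >= min_brightness          # mask of near-saturated pixels
        ys, xs = np.nonzero(bright)
        if 0 < xs.size <= max_area_px:                 # expect a small, compact bright blob
            return float(xs.mean()), float(ys.mean())  # centroid of the candidate spot
        return None                                    # no plausible laser spot found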

One or more of operations 640 and 641 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 540, in which the pronunciation correction module 430 causes the headset 120 to present the reference pronunciation of the word to the wearer (e.g., the user 132) of the headset 120.

In operation 640, the pronunciation correction module 430 accesses a set of reference phonemes included in the reference pronunciation of the word. The set of reference phonemes may be stored in the database 115 and accessed therefrom.

In operation 641, the pronunciation correction module 430 causes the speaker 240 in the headset 120 to play the set of reference phonemes accessed in operation 640. As discussed below with respect to FIG. 7, the speed at which the reference phonemes are played may vary and may be determined based on various factors. Returning to FIG. 6, one or more of operations 650, 651, and 660 may be performed at any point after operation 510, though the example embodiments illustrated indicate these operations being performed after operation 540.

In operation 650, the pronunciation correction module 430 accesses a reference set of mouth shapes (e.g., images or models of mouth shapes) each configured to speak a corresponding reference phoneme included in the reference pronunciation of the word. The reference set of mouth shapes may be stored in the database 115 and accessed therefrom. In some example embodiments, the pronunciation correction module 430 also accesses (e.g., from the database 115) an image of the user's own mouth or face for combining with (e.g., texture mapping onto, or morphing with) the reference set of mouth shapes.

In operation 651, the pronunciation correction module 430 causes a display screen (e.g., the display screen 340 of the device 130) to display the accessed reference set of mouth shapes to the wearer of the headset 120. In some example embodiments, the headset 120 and the display screen (e.g., the display screen 340) are caused to contemporaneously present the reference pronunciation (e.g., in audio form) of the word to the wearer of the headset 120 and display the accessed reference set of mouth shapes (e.g., in visual form on the display screen 340) to the wearer of the headset 120. In some example embodiments, the pronunciation correction module 430 combines (e.g., texture maps or morphs) the reference set of mouth shapes with an image of the user's own mouth or face and causes the display screen to present the resultant combination (e.g., contemporaneously with the reference pronunciation of the word).

In operation 660, the inwardly aimed camera 210 of the headset 120 has captured a mouth of the wearer of the headset 120 in the inward field-of-view, and the data analysis module 420 anonymizes the mouth gesture by cropping a portion of the inward field-of-view. The resulting cropped portion depicts the mouth gesture without depicting any eye of the wearer of the headset 120. This limited depiction may be helpful in situations where the privacy of the wearer (e.g., a young child) is important to maintain, such as where it would be beneficial to avoid capturing facial features (e.g., one or both eyes) usable by face-recognition software. In such example embodiments, the anonymized mouth gesture in the inward field-of-view is detected within the cropped portion of the inward field-of-view.
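
A minimal sketch of such a mouth-only crop is shown below, purely for illustration: it assumes the inward frame is an H x W (x C) array and that a separate, hypothetical mouth detector has already supplied a (top, bottom, left, right) pixel box that excludes the eyes.

    def crop_mouth_region(inner_frame, mouth_box, margin=0.15):
        """Sketch of operation 660: keep only a mouth-centered crop of the inward frame
        so that no eyes are captured; a small margin keeps the lips and jaw in view."""
        top, bottom, left, right = mouth_box
        pad_y = int((bottom - top) * margin)
        pad_x = int((right - left) * margin)
        h, w = inner_frame.shape[:2]
        top = max(top - pad_y, 0)          # expand slightly, but stay within the frame
        bottom = min(bottom + pad_y, h)
        left = max(left - pad_x, 0)
        right = min(right + pad_x, w)
        return inner_frame[top:bottom, left:right]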

As shown in FIG. 7, in addition to any one or more of the operations previously described, the method 500 may include one or more of operations 710, 711, 712, 713, 714, 715, 716, 730, 750, 751, 760, and 761. One or more of operations 710-716 may be performed prior to operation 520, in which the data analysis module 420 detects the three-way co-occurrence of the visual event in the outward field-of-view with the mouth gesture in the inward field-of-view and with the candidate pronunciation in the audio stream. According to various example embodiments, the detection of the co-occurrence may be further based on one or more factors (e.g., conditions) detectable from data accessed in one or more of operations 710-716.

In operation 710, the data access module 410 accesses a thermal image of a hand of the wearer (e.g., the user 132) of the headset 120. For example, the outwardly aimed camera 220 may include a thermal imaging component (e.g., the thermal imager 250 or similar) configured to capture thermal images of objects within the outward field-of-view, or the thermal imaging component (e.g., the thermal imager 250) may be a separate component of the headset 120 and aimed to capture thermal images of objects in the outward field-of-view. Accordingly, the visual event in the outward field-of-view may be detected based on the thermal image of a hand of the wearer of the headset.

In operation 711, the data access module 410 accesses a thermal image of the mouth (e.g., depicting the tongue or otherwise indicating the shape of the tongue, the position of the tongue, or both) of the wearer (e.g., the user 132) of the headset 120. For example, the inwardly aimed camera 210 may include a thermal imaging component (e.g., the thermal imager 250 or similar) configured to capture thermal images of objects within the inward field-of-view, or the thermal imaging component (e.g., the thermal imager 250) may be a separate component of the headset 120 and aimed to capture thermal images of objects in the inward field-of-view. Accordingly, the mouth gesture in the inward field-of-view may be detected based on the thermal image of the mouth of the wearer of the headset 120.

In operation 712, the data access module 410 accesses eye tracker data that indicates an eye orientation of the wearer (e.g., the user 132) of the headset 120. For example, the headset 120 may further include an eye-tracking camera (e.g., the eye tracker 251) that may have a further field-of-view and be configured to capture the orientation of one or both eyes of the wearer in the further field-of-view. Accordingly, the data analysis module 420 may determine the direction in which one or both eyes of the wearer are looking based on the eye orientation indicated in the eye tracker data, and the visual event in the outward field-of-view may be detected based on the determined viewing direction in which the eye of the wearer is looking. For example, the determined viewing direction may be a basis for detecting the visual event (e.g., disambiguating or otherwise identifying the word for which the candidate pronunciation is represented in the audio stream generated by the microphone 230).

In operation 713, the data access module 410 accesses anemometer data that indicates one or more breath velocities of the wearer (e.g., the user 132) of the headset 120. For example, the headset 120 may include an anemometer (e.g., the anemometer 252) configured to detect a breath velocity of air entering or exiting the mouth of the wearer of the headset 120. Accordingly, the causing of the headset 120 to present the reference pronunciation of the word in operation 540 may be based on the detected breath velocity of the wearer of the headset 120. For example, if the anemometer data indicates improper breathing in the candidate pronunciation of the word, the pronunciation correction module 430 may generate or access (e.g., from the database 115) an over-articulated reference pronunciation of the word or otherwise obtain an over-articulated reference pronunciation of the word and then cause the over-articulated pronunciation to be presented (e.g., played) to the wearer of the headset 120.
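
To illustrate how anemometer data might feed such a decision, the sketch below compares measured breath velocities against an expected breathing profile for the word and selects an over-articulated variant when the deviation is large. The profile format, the relative tolerance, and the variant labels are assumptions introduced only for this example.

    def choose_pronunciation_variant(breath_velocities, expected_profile, tolerance=0.3):
        """Sketch of the use of operation 713's data: pick the standard or an
        over-articulated reference pronunciation based on breath-velocity deviation."""
        deviations = [
            abs(measured - expected) / max(expected, 1e-6)
            for measured, expected in zip(breath_velocities, expected_profile)
        ]
        improper_breathing = max(deviations, default=0.0) > tolerance
        return "over_articulated" if improper_breathing else "standard"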

In operation 714, the data access module 410 accesses biosensor data that indicates one or more physiological conditions of the wearer (e.g., the user 132) of the headset 120. The biosensor data may be accessed from one or more biosensors (e.g., the biosensors 253) included in the headset 120 or communicatively coupled thereto. For example, one or more of the biosensors 253 may be positioned within the headset 120, communicatively coupled thereto, or otherwise configured to measure the heartrate of the wearer, a galvanic skin response of the wearer, one or more other skin conditions (e.g., temperature or elasticity) of the wearer, an electroencephalogram of the wearer, one or more brain states of the wearer, or any suitable combination thereof. Accordingly, the pronunciation correction module 430 may determine a speed at which the reference pronunciation of the word is to be played (e.g., to the wearer) based on the information indicated in the accessed biosensor data.

In operation 715, the data access module 410 accesses accelerometer data that indicates one or more muscle movements made by the wearer (e.g., the user 132) of the headset 120. The accelerometer data may be accessed from one or more accelerometers (e.g., the accelerometers 254) included in the headset 120 or communicatively coupled thereto. For example, one or more of the accelerometers 254 may be positioned within the headset 120, communicatively coupled thereto (e.g., included in a collar worn by the wearer of the headset 120), or otherwise configured to detect (e.g., by measurement) one or more muscle movements made during performance of the candidate pronunciation of the word by the wearer. Accordingly, the pronunciation correction module 430 may detect a pattern of muscular movements based on the accessed accelerometer data, and the causing of the headset 120 to present the reference pronunciation of the word in operation 540 may be based on the detected pattern of muscular movements. For example, if the accelerometer data indicates an improper pattern of muscular movements in performing a candidate pronunciation of the word, the pronunciation correction module 430 may generate or access (e.g., from the database 115) an over-articulated reference pronunciation of the word or otherwise obtain an over-articulated reference pronunciation of the word and then cause the over-articulated pronunciation to be presented (e.g., played) to the wearer of the headset 120.

In operation 716, the data access module 410 accesses depth sensor data that indicates a distance to an object in the outward field-of-view. The depth sensor data may be accessed from one or more depth sensors (e.g., the depth sensor 257) included in the headset 120 or communicatively coupled thereto. For example, the depth sensor 257 may be a stereoscopic infrared depth sensor configured to detect distances to physical objects within the outward field-of-view. In some example situations, the outwardly aimed camera 220 of the headset 120 is configured to capture a hand of the wearer (e.g., the user 132) of the headset 120 designating a physical object in the outward field-of-view by touching the physical object at the distance detected by the depth sensor. Furthermore, the designated object may be correlated (e.g., by the database 115) to the word for which the candidate pronunciation is represented in the audio stream generated by the microphone 230, as well as correlated to the reference pronunciation of the word. Accordingly, the visual event in the outward field-of-view may be or include the hand of the wearer touching the designated object in the outward field-of-view.

In operation 730, the pronunciation correction module 430 determines a speed at which the reference pronunciation of the word is to be played back. For example, the pronunciation correction module 430 may determine a playback speed (e.g., 1×, 0.9×, 1.2×, or 0.5×) for the reference pronunciation, and the playback speed may be determined based on results from one or more of operations 712-715. As an example, the data analysis module 420 may detect that the wearer (e.g., the user 132) of the headset 120 exhibited a state of stress, fatigue, frustration, or other physiologically detectable state in performing the candidate pronunciation of the word, and this detection may be based on the eye tracker data accessed in operation 712, the anemometer data accessed in operation 713, the biosensor data accessed in operation 714, the accelerometer data accessed in operation 715, or any suitable combination thereof. Based on the detected state, the pronunciation correction module 430 may vary the playback speed of the reference pronunciation. Accordingly, the causing of the headset 120 to present the reference pronunciation of the word in operation 540 may be based on the playback speed determined in operation 730, and the reference pronunciation consequently may be played at that playback speed.
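
A minimal sketch of such a speed decision is given below, also covering the rest-break variant described in the next paragraph. The 0..1 stress and fatigue scores, the particular thresholds, and the specific speed values are illustrative assumptions, not values from the disclosure.

    def determine_playback_speed(stress_score, fatigue_score, rest_threshold=0.9):
        """Sketch of operation 730: map physiologically detectable states to a playback
        speed for the reference pronunciation. Returning None stands in for the
        zero/null speed that triggers a rest-break suggestion instead of playback."""
        load = max(stress_score, fatigue_score)
        if load >= rest_threshold:
            return None    # suggest a rest break; omit or replace playback
        if load >= 0.6:
            return 0.5     # markedly slower reference pronunciation
        if load >= 0.3:
            return 0.9     # slightly slower than normal
        return 1.0         # normal playback speed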

In certain example embodiments, the pronunciation correction module 430, in performing operation 730, determines that the speed at which the reference pronunciation is to be played back is zero or a null value for the speed. In particular, if the data analysis module 420 detects a sufficiently high state of stress, fatigue, frustration, or other physiologically detectable state in performing the candidate pronunciation of the word (e.g., transgressing beyond a threshold level), the pronunciation correction module 430 triggers a suggestion, recommendation, or other indication that the wearer (e.g., the user 132) take a rest break and resume performing candidate pronunciations of words after a period of recovery time. In such situations, the playback of the reference pronunciation of the word may be omitted or replaced with the triggered suggestion, recommendation, or other indication to take a rest break.

In operation 750, the pronunciation correction module 430 accesses a reference pattern of muscular movements configured to speak the reference pronunciation of the word. For example, the reference pattern of muscular movements may be stored in the database 115 and accessed therefrom.

In operation 751, the pronunciation correction module 430 causes one or more muscle stimulators (e.g., the muscle stimulator 255, which may be or include a neuromuscular electrical muscle stimulator) to stimulate a set of one or more muscles of the wearer (e.g., the user 132) of the headset 120. As an example, the muscle stimulator 255 may be included in the headset 120, communicatively coupled thereto (e.g., included in a collar that is communicatively coupled to the headset 120), or otherwise configured to stimulate a set of muscles of the wearer. Accordingly, the set of muscles may be caused (e.g., via neuromuscular electrical stimulation (NMES)) to move in accordance with the reference pattern of muscular movements. In some example embodiments, this causation of muscle motion is performed in conjunction with one or more repetitions of operation 540, in which the reference pronunciation of the word is caused to be presented to the wearer of the headset 120 (e.g., to assist the wearer in practicing how to articulate or otherwise perform the reference pronunciation of the word).

In operation 760, the pronunciation correction module 430 compares the candidate pronunciation of the word to the reference pronunciation of the word. This comparison may be made on a phoneme-by-phoneme basis, such that a sequentially first phoneme included in the candidate pronunciation is compared to a counterpart first phoneme included in the reference pronunciation, a sequentially second phoneme included in the candidate pronunciation is compared to a counterpart second phoneme included in the reference pronunciation, and so on.
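
As a purely illustrative sketch of such a phoneme-by-phoneme comparison (assuming both pronunciations have already been converted to lists of phoneme symbols by an upstream recognizer not shown here), operation 760 might reduce to:

    def compare_pronunciations(candidate_phonemes, reference_phonemes):
        """Sketch of operation 760: pair the first candidate phoneme with the first
        reference phoneme, the second with the second, and so on, and report the
        mismatched positions and any difference in length."""
        mismatches = []
        for i, (cand, ref) in enumerate(zip(candidate_phonemes, reference_phonemes)):
            if cand != ref:
                mismatches.append((i, cand, ref))
        length_difference = abs(len(candidate_phonemes) - len(reference_phonemes))
        return {"mismatches": mismatches, "length_difference": length_difference}

Under this sketch, a threshold on the number of mismatches could be one way to decide whether to recommend the pronunciation tutorial of operation 761, though the disclosure does not specify such a rule.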

In operation 761, based on the comparison performed in operation 760, the pronunciation correction module 430 recommends a pronunciation tutorial to the wearer (e.g., the user 132) of the headset 120. For example, the pronunciation correction module 430 may cause presentation of an indication (e.g., a dialog box, an alert, an audio message, or any suitable combination thereof) that a pronunciation tutorial is being recommended to the wearer. In some example embodiments, the wearer can respond with an acceptance of the recommendation, and in response to the acceptance of the recommendation, the pronunciation correction module 430 may cause (e.g., command) the reading instruction module 310 to initiate presentation of a reading tutorial that teaches one or more reading skills used in reading the word, cause the speaking instruction module 320 to initiate a presentation of a speech tutorial that teaches one or more speaking skills used in pronouncing the word, cause the instructional game module 330 to initiate an instructional game that teaches one or more of the reading or speaking skills, or cause any suitable combination thereof.

According to various example embodiments, one or more of the methodologies described herein may facilitate teaching of language, or from another perspective, may facilitate learning of language. Moreover, one or more of the methodologies described herein may facilitate instructing the user 132 in hearing, practicing, and correcting proper pronunciations of phonemes, words, sentences, or any suitable combination thereof. Hence, one or more of the methodologies described herein may facilitate the teaching of language by facilitating a learner's learning of language, compared to capabilities of pre-existing systems and methods.

When these effects are considered in aggregate, one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in language instruction or language learning. Efforts expended by the user 132 in learning language skills, by a language teacher in teaching such language skills, or both, may be reduced by use of (e.g., reliance upon) a special-purpose machine that implements one or more of the methodologies described herein. Computing resources used by one or more systems or machines (e.g., within the network environment 100) may similarly be reduced (e.g., compared to systems or machines that lack the structures discussed herein or are otherwise unable to perform the functions discussed herein). Examples of such computing resources include processor cycles, network traffic, computational capacity, main memory usage, graphics rendering capacity, graphics memory usage, data storage capacity, power consumption, and cooling capacity.

FIG. 8 is a block diagram illustrating components of a machine 800, according to some example embodiments, able to read instructions 824 from a machine-readable medium 822 (e.g., a non-transitory machine-readable medium, a machine-readable storage medium, a computer-readable storage medium, or any suitable combination thereof) and perform any one or more of the methodologies discussed herein, in whole or in part. Specifically, FIG. 8 shows the machine 800 in the example form of a computer system (e.g., a computer) within which the instructions 824 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 800 to perform any one or more of the methodologies discussed herein may be executed, in whole or in part.

In alternative embodiments, the machine 800 operates as a standalone device or may be communicatively coupled (e.g., networked) to other machines. In a networked deployment, the machine 800 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a distributed (e.g., peer-to-peer) network environment. The machine 800 may be a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a cellular telephone, a smart phone, a set-top box (STB), a personal digital assistant (PDA), a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 824, sequentially or otherwise, that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute the instructions 824 to perform all or part of any one or more of the methodologies discussed herein.

The machine 800 includes a processor 802 (e.g., one or more central processing units (CPUs), one or more graphics processing units (GPUs), one or more digital signal processors (DSPs), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any suitable combination thereof), a main memory 804, and a static memory 806, which are configured to communicate with each other via a bus 808. The processor 802 contains solid-state digital microcircuits (e.g., electronic, optical, or both) that are configurable, temporarily or permanently, by some or all of the instructions 824 such that the processor 802 is configurable to perform any one or more of the methodologies described herein, in whole or in part. For example, a set of one or more microcircuits of the processor 802 may be configurable to execute one or more modules (e.g., software modules) described herein. In some example embodiments, the processor 802 is a multicore CPU (e.g., a dual-core CPU, a quad-core CPU, an 8-core CPU, or a 128-core CPU) within which each of multiple cores behaves as a separate processor that is able to perform any one or more of the methodologies discussed herein, in whole or in part. Although the beneficial effects described herein may be provided by the machine 800 with at least the processor 802, these same beneficial effects may be provided by a different kind of machine that contains no processors (e.g., a purely mechanical system, a purely hydraulic system, or a hybrid mechanical-hydraulic system), if such a processor-less machine is configured to perform one or more of the methodologies described herein.

The machine 800 may further include a graphics display 810 (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, a cathode ray tube (CRT), or any other display capable of displaying graphics or video). The machine 800 may also include an alphanumeric input device 812 (e.g., a keyboard or keypad), a pointer input device 814 (e.g., a mouse, a touchpad, a touchscreen, a trackball, a joystick, a stylus, a motion sensor, an eye tracking device, a data glove, or other pointing instrument), a data storage 816, an audio generation device 818 (e.g., a sound card, an amplifier, a speaker, a headphone jack, or any suitable combination thereof), and a network interface device 820.

The data storage 816 (e.g., a data storage device) includes the machine-readable medium 822 (e.g., a tangible and non-transitory machine-readable storage medium) on which are stored the instructions 824 embodying any one or more of the methodologies or functions described herein. The instructions 824 may also reside, completely or at least partially, within the main memory 804, within the static memory 806, within the processor 802 (e.g., within the processor's cache memory), or any suitable combination thereof, before or during execution thereof by the machine 800. Accordingly, the main memory 804, the static memory 806, and the processor 802 may be considered machine-readable media (e.g., tangible and non-transitory machine-readable media). The instructions 824 may be transmitted or received over the network 190 via the network interface device 820. For example, the network interface device 820 may communicate the instructions 824 using any one or more transfer protocols (e.g., hypertext transfer protocol (HTTP)).

In some example embodiments, the machine 800 may be a portable computing device (e.g., a smart phone, a tablet computer, or a wearable device) and may have one or more additional input components 830 (e.g., sensors or gauges). Examples of such input components 830 include an image input component (e.g., one or more cameras), an audio input component (e.g., one or more microphones), a direction input component (e.g., a compass), a location input component (e.g., a global positioning system (GPS) receiver), an orientation component (e.g., a gyroscope), a motion detection component (e.g., one or more accelerometers), an altitude detection component (e.g., an altimeter), a temperature input component (e.g., a thermometer), and a gas detection component (e.g., a gas sensor). Input data gathered by any one or more of these input components 830 may be accessible and available for use by any of the modules described herein (e.g., with suitable privacy notifications and protections, such as opt-in consent or opt-out consent, implemented in accordance with user preference, applicable regulations, or any suitable combination thereof).

As used herein, the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 822 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of carrying (e.g., storing or communicating) the instructions 824 for execution by the machine 800, such that the instructions 824, when executed by one or more processors of the machine 800 (e.g., processor 802), cause the machine 800 to perform any one or more of the methodologies described herein, in whole or in part. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as cloud-based storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, one or more tangible and non-transitory data repositories (e.g., data volumes) in the example form of a solid-state memory chip, an optical disc, a magnetic disc, or any suitable combination thereof.

A “non-transitory” machine-readable medium, as used herein, specifically excludes propagating signals per se. According to various example embodiments, the instructions 824 for execution by the machine 800 can be communicated via a carrier medium (e.g., a machine-readable carrier medium). Examples of such a carrier medium include a non-transient carrier medium (e.g., a non-transitory machine-readable storage medium, such as a solid-state memory that is physically movable from one place to another place) and a transient carrier medium (e.g., a carrier wave or other propagating signal that communicates the instructions 824).

Certain example embodiments are described herein as including modules. Modules may constitute software modules (e.g., code stored or otherwise embodied in a machine-readable medium or in a transmission medium), hardware modules, or any suitable combination thereof. A “hardware module” is a tangible (e.g., non-transitory) physical component (e.g., a set of one or more processors) capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems or one or more hardware modules thereof may be configured by software (e.g., an application or portion thereof) as a hardware module that operates to perform operations described herein for that module.

In some example embodiments, a hardware module may be implemented mechanically, electronically, hydraulically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. A hardware module may be or include a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. As an example, a hardware module may include software encompassed within a CPU or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, hydraulically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity that may be physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Furthermore, as used herein, the phrase “hardware-implemented module” refers to a hardware module. Considering example embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module includes a CPU configured by software to become a special-purpose processor, the CPU may be configured as respectively different special-purpose processors (e.g., each included in a different hardware module) at different times. Software (e.g., a software module) may accordingly configure one or more processors, for example, to become or otherwise constitute a particular hardware module at one instance of time and to become or otherwise constitute a different hardware module at a different instance of time.

Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory (e.g., a memory device) to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information from a computing resource).

The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module in which the hardware includes one or more processors. Accordingly, the operations described herein may be at least partially processor-implemented, hardware-implemented, or both, since a processor is an example of hardware, and at least some operations within any one or more of the methods discussed herein may be performed by one or more processor-implemented modules, hardware-implemented modules, or any suitable combination thereof.

Moreover, such one or more processors may perform operations in a “cloud computing” environment or as a service (e.g., within a “software as a service” (SaaS) implementation). For example, at least some operations within any one or more of the methods discussed herein may be performed by a group of computers (e.g., as examples of machines that include processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)). The performance of certain operations may be distributed among the one or more processors, whether residing only within a single machine or deployed across a number of machines. In some example embodiments, the one or more processors or hardware modules (e.g., processor-implemented modules) may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or hardware modules may be distributed across a number of geographic locations.

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and their functionality presented as separate components and functions in example configurations may be implemented as a combined structure or component with combined functions. Similarly, structures and functionality presented as a single component may be implemented as separate components and functions. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Some portions of the subject matter discussed herein may be presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a memory (e.g., a computer memory or other machine memory). Such algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.

Unless specifically stated otherwise, discussions herein using words such as “accessing,” “processing,” “detecting,” “computing,” “calculating,” “determining,” “generating,” “presenting,” “displaying,” or the like refer to actions or processes performable by a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any suitable combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.

The following enumerated descriptions describe various examples of methods, machine-readable media, and systems (e.g., machines, devices, or other apparatus) discussed herein.

A first example provides a method comprising:

accessing, by one or more processors of a machine, outer and inner video streams and an audio stream all provided by a headset that includes an outwardly aimed camera, an inwardly aimed camera, and a microphone, the outwardly aimed camera having an outward field-of-view extending away from a wearer of the headset and generating the outer video stream from the outward field-of-view, the inwardly aimed camera having an inward field-of-view extending toward the wearer and generating the inner video stream from the inward field-of-view;
detecting, by the one or more processors of the machine, a co-occurrence of a visual event in the outward field-of-view with a mouth gesture in the inward field-of-view and with a candidate pronunciation of a word, the visual event being represented in the outer video stream, the mouth gesture being represented in the inner video stream, the candidate pronunciation being represented in the audio stream;
determining, by the one or more processors of the machine, that the visual event is correlated by a database to the word and to a reference pronunciation of the word; and
causing, by the one or more processors of the machine, the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word.
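By way of illustration only, and not as part of any enumerated example or claim, the following Python sketch shows one plausible way to orchestrate the co-occurrence check described above, assuming the three streams have already been reduced to time-stamped events by separate detectors; the event structure, the slack window, the stub database, and all names are assumptions introduced here for clarity.

```python
# Illustrative sketch only. Stream decoding, gesture recognition, and speech
# recognition are assumed to happen elsewhere; here each detector output is a
# time-stamped Event so that the co-occurrence logic itself is runnable.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    label: str    # e.g., "point_at_word:cat", "mouth_gesture:cat", "speech:cat"
    start: float  # seconds since the start of the session
    end: float


def overlaps(a: Event, b: Event, slack: float = 0.5) -> bool:
    """True when two events overlap in time, allowing a small slack window."""
    return a.start <= b.end + slack and b.start <= a.end + slack


def detect_cooccurrence(visual: Event, mouth: Event, speech: Event) -> bool:
    """A co-occurrence requires all three events to overlap pairwise in time."""
    return overlaps(visual, mouth) and overlaps(mouth, speech) and overlaps(visual, speech)


# Hypothetical lookup standing in for the database correlation step.
REFERENCE_PRONUNCIATIONS = {"cat": "/kaet/"}


def handle_events(visual: Event, mouth: Event, speech: Event, word: str) -> Optional[str]:
    """Return the reference pronunciation to present, or None if nothing triggers."""
    if detect_cooccurrence(visual, mouth, speech):
        return REFERENCE_PRONUNCIATIONS.get(word)
    return None


if __name__ == "__main__":
    visual = Event("point_at_word:cat", 1.0, 2.0)  # from the outer video stream
    mouth = Event("mouth_gesture:cat", 1.2, 2.1)   # from the inner video stream
    speech = Event("speech:cat", 1.3, 2.2)         # from the audio stream
    print(handle_events(visual, mouth, speech, "cat"))  # -> "/kaet/"
```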

A second example provides a method according to the first example, wherein:

the causing of the headset to present the reference pronunciation of the word to the wearer of the headset includes:
accessing a set of reference phonemes included in the reference pronunciation of the word; and
causing a speaker in the headset to play the set of reference phonemes included in the reference pronunciation.
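As a purely illustrative sketch of the phoneme-playback step, the fragment below concatenates per-phoneme audio clips into a single reference clip using only the Python standard library; the phoneme symbols, the phoneme-to-clip mapping, and the file layout are assumptions, not part of this disclosure.

```python
# Illustrative sketch only: concatenate per-phoneme WAV clips into one playable
# reference clip. The phoneme symbols and clip file paths are assumptions.
import wave


def build_reference_clip(phonemes, clip_paths, out_path="reference.wav"):
    """Write a single WAV containing the reference phonemes in order.

    phonemes:   ordered phoneme symbols, e.g. ["k", "ae", "t"]
    clip_paths: mapping from phoneme symbol to a WAV file path
    """
    frames = []
    params = None
    for phoneme in phonemes:
        with wave.open(clip_paths[phoneme], "rb") as clip:
            if params is None:
                params = clip.getparams()  # reuse channel count / rate of first clip
            frames.append(clip.readframes(clip.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)
    return out_path  # hand this file to whatever audio output drives the speaker
```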

A third example provides a method according to the first example or the second example, wherein:

the outwardly aimed camera of the headset captures the word in the outward field-of-view; and
in the detected co-occurrence, the visual event in the outward field-of-view includes a hand performing at least one of: handwriting the word, tracing the word, pointing at the word, touching the word, underlining the word, or highlighting the word.

A fourth example provides a method according to any of the first through third examples, wherein:

the inwardly aimed camera of the headset captures a mouth of the wearer in the inward field-of-view; and
in the detected co-occurrence, the mouth gesture in the inward field-of-view includes the mouth of the wearer sequentially making a candidate set of mouth shapes each configured to speak a corresponding candidate phoneme included in the candidate pronunciation of the word.
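A hedged illustration of how a sequence of detected mouth shapes might be expanded into candidate phonemes follows; the viseme labels, the mapping table, and the upstream mouth-shape classifier are assumptions introduced only for this sketch.

```python
# Illustrative sketch: a hypothetical viseme-to-phoneme table used to turn a
# sequence of classified mouth shapes from the inner video stream into the
# candidate phonemes they could encode. Labels and classifier are assumptions.
VISEME_TO_PHONEMES = {
    "closed_lips": ["p", "b", "m"],
    "open_mid": ["ae", "eh"],
    "teeth_on_lip": ["f", "v"],
    "tongue_up": ["t", "d", "k"],
}


def candidate_phonemes(visemes):
    """Expand each detected mouth shape into the phonemes it may represent."""
    return [VISEME_TO_PHONEMES.get(viseme, []) for viseme in visemes]


print(candidate_phonemes(["tongue_up", "open_mid", "tongue_up"]))
# [['t', 'd', 'k'], ['ae', 'eh'], ['t', 'd', 'k']] -- compatible with "cat"
```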

A fifth example provides a method according to any of the first through fourth examples, wherein:

the inwardly aimed camera of the headset captures a mouth of the wearer in the inward field-of-view;
the method further comprises:
anonymizing the mouth gesture by cropping a portion of the inward field-of-view, the cropped portion depicting the mouth gesture without depicting any eye of the wearer of the headset; and wherein:
in the detected co-occurrence, the anonymized mouth gesture in the inward field-of-view is detected within the cropped portion of the inward field-of-view.
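The anonymizing crop could, for example, be realized as a simple array slice on each decoded frame of the inner video stream, as in the hypothetical sketch below; the fixed 55% cut-off merely stands in for a real mouth-region detector and is an assumption.

```python
# Illustrative sketch: anonymize a frame of the inner video stream by keeping
# only a lower-face crop that excludes the eyes. The fixed 55% cut-off is an
# assumption; a real system might locate the mouth with a landmark detector.
import numpy as np


def crop_mouth_region(frame: np.ndarray, lower_fraction: float = 0.55) -> np.ndarray:
    """Return the bottom portion of the frame (rows below the cut-off)."""
    top = int(frame.shape[0] * lower_fraction)
    return frame[top:, :, :]


frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a decoded frame
print(crop_mouth_region(frame).shape)            # (216, 640, 3)
```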

A sixth example provides a method according to any of the first through fifth examples, further comprising:

accessing a reference set of mouth shapes each configured to speak a corresponding reference phoneme included in the reference pronunciation of the word; and
causing a display screen to display the accessed reference set of mouth shapes to the wearer of the headset.

A seventh example provides a method according to the sixth example, wherein:

the headset and the display screen are caused to contemporaneously present the reference pronunciation of the word to the wearer of the headset and display the accessed reference set of mouth shapes to the wearer of the headset.

An eighth example provides a method according to the sixth example or the seventh example, wherein:

the causing of the display screen to display the accessed reference set of mouth shapes includes combining the reference set of mouth shapes with an image that depicts a mouth of the wearer and causing the display screen to display a resultant combination of the image and the reference set of mouth shapes.
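One possible (assumed) realization of combining the reference mouth shapes with an image of the wearer's mouth is straightforward alpha blending, sketched below; the blend weight and the requirement that both images share the same dimensions are assumptions of this sketch.

```python
# Illustrative sketch: overlay a reference mouth-shape image onto an image of
# the wearer's mouth by alpha blending. The blend weight is an assumption, and
# both images are assumed to have the same height, width, and channel count.
import numpy as np


def combine_mouth_images(wearer_mouth: np.ndarray,
                         reference_shape: np.ndarray,
                         alpha: float = 0.4) -> np.ndarray:
    """Blend the reference mouth shape over the wearer's mouth image."""
    blended = ((1.0 - alpha) * wearer_mouth.astype(np.float32)
               + alpha * reference_shape.astype(np.float32))
    return blended.clip(0, 255).astype(np.uint8)
```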

A ninth example provides a method according to any of the first through eighth examples, wherein:

the outwardly aimed camera of the headset captures a physical model that represents the word in the outward field-of-view; and
in the detected co-occurrence, the visual event in the outward field-of-view includes a hand performing at least one of: touching the physical model, grasping the physical model, moving the physical model, or rotating the physical model.

A tenth example provides a method according to any of the first through ninth examples, wherein:

the outwardly aimed camera of the headset captures a hand of the wearer in the outward field-of-view; and
in the detected co-occurrence, the visual event in the outward field-of-view includes the hand performing a trigger gesture that indicates a correction request for correction of the candidate pronunciation.

An eleventh example provides a method according to the tenth example, wherein:

the causing of the headset to present the reference pronunciation of the word fulfills the request indicated by the trigger gesture performed by the hand of the wearer.

A twelfth example provides a method according to any of the first through eleventh examples, wherein:

the reference pronunciation presented in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word includes an over-articulated pronunciation of the word.

A thirteenth example provides a method according to any of the first through twelfth examples, wherein:

the outwardly aimed camera includes a thermal imaging component; and
in the detected co-occurrence, the visual event in the outward field-of-view is detected based on a thermal image of a hand of the wearer of the headset.

A fourteenth example provides a method according to any of the first through thirteenth examples, wherein:

the inwardly aimed camera includes a thermal imaging component; and
in the detected co-occurrence, the mouth gesture in the inward field-of-view is detected based on a thermal image of a tongue of the wearer of the headset.

A fifteenth example provides a method according to any of the first to fourteenth examples, wherein:

the headset further includes an eye-tracking camera having a further field-of-view and configured to capture an eye orientation of the wearer in the further field-of-view;
the method further comprises:
determining a direction (e.g., a viewing direction) in which the eye of the wearer is looking based on the eye orientation of the wearer; and wherein:
in the detected co-occurrence, the visual event in the outward field-of-view is detected based on the determined direction in which the eye of the wearer is looking.
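For illustration only, the sketch below shows one simplified way a viewing direction could be derived from a tracked pupil position and then used to restrict where the visual event is searched for in the outward field-of-view; the linear pupil-to-angle mapping, field-of-view values, and window size are assumptions.

```python
# Illustrative sketch: map a tracked pupil offset to a coarse viewing direction
# and then to a region of interest in the outward camera frame. The linear
# pupil-to-angle mapping and all constants are simplifying assumptions.
def gaze_angles(pupil_xy, eye_center_xy, degrees_per_pixel=0.15):
    """Approximate horizontal/vertical gaze angles (degrees) from pupil offset."""
    dx = pupil_xy[0] - eye_center_xy[0]
    dy = pupil_xy[1] - eye_center_xy[1]
    return dx * degrees_per_pixel, dy * degrees_per_pixel


def region_of_interest(gaze, frame_size, fov_degrees=(60.0, 40.0), window=0.2):
    """Convert gaze angles into a pixel box where the visual event is searched."""
    width, height = frame_size
    cx = 0.5 + gaze[0] / fov_degrees[0]  # normalized horizontal gaze position
    cy = 0.5 + gaze[1] / fov_degrees[1]  # normalized vertical gaze position
    return (int((cx - window / 2) * width), int((cy - window / 2) * height),
            int((cx + window / 2) * width), int((cy + window / 2) * height))


print(region_of_interest(gaze_angles((330, 245), (320, 240)), (1280, 720)))
# (544, 301, 800, 445) -- only this box is scanned for the visual event
```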

A sixteenth example provides a method according to any of the first through fifteenth examples, wherein:

the headset further includes an anemometer configured to detect a breath velocity of the wearer of the headset; and
the causing of the headset to present the reference pronunciation of the word is based on the detected breath velocity of the wearer of the headset.

A seventeenth example provides a method according to any of the first through sixteenth examples, wherein:

the headset further includes a biosensor configured to detect a stress level of the wearer of the headset; and the method further comprises:
triggering presentation of an indication that the wearer of the headset take a rest break based on the detected stress level of the wearer.

An eighteenth example provides a method according to any of the first through seventeenth examples, wherein:

the headset is communicatively coupled to a biosensor configured to detect a skin condition of the wearer of the headset;
the method further comprises:
determining a playback speed at which the reference pronunciation is to be presented to the wearer based on the skin condition detected by the biosensor; and wherein:
the causing of the headset to present the reference pronunciation of the word includes causing the reference pronunciation to be played at the playback speed determined based on the skin condition.

A nineteenth example provides a method according to any of the first through eighteenth examples, wherein:

the headset is communicatively coupled to a biosensor configured to detect a heartrate of the wearer of the headset;
the method further comprises:
determining a playback speed at which the reference pronunciation is to be presented to the wearer based on the heartrate detected by the biosensor; and wherein:
the causing of the headset to present the reference pronunciation of the word includes causing the reference pronunciation to be played at the playback speed determined based on the heartrate.
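A minimal sketch of deriving a playback speed from a detected heartrate follows, assuming a simple linear interpolation between a calm and a stressed anchor point; all numeric values are assumptions, since the disclosure states only that the playback speed is determined based on the heartrate.

```python
# Illustrative sketch: derive a playback speed from the detected heartrate by
# linear interpolation between two anchor points. All numbers are assumptions.
def playback_speed(heart_rate_bpm,
                   calm_bpm=60.0, stressed_bpm=110.0,
                   calm_speed=1.0, stressed_speed=0.7):
    """Slow the reference pronunciation down as the wearer's heartrate rises."""
    t = (heart_rate_bpm - calm_bpm) / (stressed_bpm - calm_bpm)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return calm_speed + t * (stressed_speed - calm_speed)


print(playback_speed(72))   # ~0.93x playback
print(playback_speed(120))  # 0.70x playback (clamped)
```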

A twentieth example provides a method according to any of the first through nineteenth examples, wherein:

the headset is communicatively coupled to a biosensor configured to produce an electroencephalogram of the wearer of the headset;
the method further comprises:
determining a playback speed at which the reference pronunciation is to be presented to the wearer based on the electroencephalogram produced by the biosensor; and wherein:
the causing of the headset to present the reference pronunciation of the word includes causing the reference pronunciation to be played at the playback speed determined based on the electroencephalogram.

A twenty-first example provides a method according to any of the first through twentieth examples, wherein:

the headset is communicatively coupled to a set of accelerometers included in a collar worn by the wearer of the headset;
the method further comprises:
detecting a pattern of muscular movements based on accelerometer data generated by the set of accelerometers in the collar; and wherein:
the causing of the headset to present the reference pronunciation of the word is based on the detected pattern of muscular movements.

A twenty-second example provides a method according to the twenty-first example, wherein:

the headset is communicatively coupled to a set of neuromuscular electrical muscle stimulators included in the collar worn by the wearer of the headset;
the detected pattern of muscular movements is a candidate pattern of muscular movements made by the wearer in speaking the candidate pronunciation of the word; and
the method further comprises:
accessing a reference pattern of muscular movements configured to speak the reference pronunciation of the word; and
causing the neuromuscular electrical muscle stimulators in the collar to stimulate a set of muscles of the wearer based on the accessed reference pattern of muscular movements.

A twenty-third example provides a method according to any of the first through twenty-second examples, wherein:

the headset includes an outwardly aimed laser emitter configured to designate an object in the outward field-of-view by causing a spot of laser light to appear on a surface of the object in the outward field-of-view;
the outwardly aimed camera of the headset is configured to capture the spot of laser light and the designated object in the outward field-of-view;
the designated object is correlated by the database to the word and to the reference pronunciation of the word; and
in the detected co-occurrence, the visual event in the outward field-of-view includes the spot of laser light being caused to appear on the surface of the designated object in the outward field-of-view.

A twenty-fourth example provides a method according to any of the first through twenty-third examples, wherein:

the headset includes a stereoscopic depth sensor configured to detect a distance to an object in the outward field-of-view;
the outwardly aimed camera of the headset is configured to capture a hand of the wearer of the headset designating the object by touching the object at the distance in the outward field-of-view;
the designated object is correlated by the database to the word and to the reference pronunciation of the word; and
in the detected co-occurrence, the visual event in the outward field-of-view includes the hand of the wearer touching the designated object in the outward field-of-view.

A twenty-fifth example provides a method according to any of the first to twenty-fourth examples, further comprising:

performing a comparison of candidate phonemes in the candidate pronunciation of the word to reference phonemes in the reference pronunciation of the word; and
recommending a pronunciation tutorial to the wearer of the headset based on the comparison of the candidate phonemes to the reference phonemes.
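As a hedged illustration of the comparison and recommendation steps, the sketch below aligns candidate and reference phoneme sequences with Python's difflib and maps mismatched reference phonemes to a hypothetical tutorial catalogue; the catalogue and the choice of alignment method are assumptions.

```python
# Illustrative sketch: align candidate phonemes against reference phonemes and
# recommend a tutorial for each mismatched reference phoneme. The alignment via
# difflib and the tutorial catalogue are assumptions for demonstration only.
import difflib

TUTORIALS = {
    "ae": "Tutorial: the open front vowel in 'cat'",
    "r": "Tutorial: the rhotic consonant in 'red'",
}


def recommend_tutorials(candidate, reference):
    """Return tutorials for reference phonemes the candidate missed or replaced."""
    matcher = difflib.SequenceMatcher(a=reference, b=candidate)
    missed = []
    for op, i1, i2, _, _ in matcher.get_opcodes():
        if op in ("replace", "delete"):
            missed.extend(reference[i1:i2])
    return [TUTORIALS[p] for p in missed if p in TUTORIALS]


print(recommend_tutorials(candidate=["k", "eh", "t"], reference=["k", "ae", "t"]))
# ["Tutorial: the open front vowel in 'cat'"]
```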

A twenty-sixth example provides a machine-readable medium (e.g., a non-transitory machine-readable storage medium) comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:

accessing outer and inner video streams and an audio stream all provided by a headset that includes an outwardly aimed camera, an inwardly aimed camera, and a microphone, the outwardly aimed camera having an outward field-of-view extending away from a wearer of the headset and generating the outer video stream from the outward field-of-view, the inwardly aimed camera having an inward field-of-view extending toward the wearer and generating the inner video stream from the inward field-of-view;
detecting a co-occurrence of a visual event in the outward field-of-view with a mouth gesture in the inward field-of-view and with a candidate pronunciation of a word, the visual event being represented in the outer video stream, the mouth gesture being represented in the inner video stream, the candidate pronunciation being represented in the audio stream;
determining that the visual event is correlated by a database to the word and to a reference pronunciation of the word; and
causing the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word.

A twenty-seventh example provides a system (e.g., a computer system) comprising:

one or more processors; and
a memory storing instructions that, when executed by at least one processor among the one or more processors, cause the system to perform operations comprising:
accessing outer and inner video streams and an audio stream all provided by a headset that includes an outwardly aimed camera, an inwardly aimed camera, and a microphone, the outwardly aimed camera having an outward field-of-view extending away from a wearer of the headset and generating the outer video stream from the outward field-of-view, the inwardly aimed camera having an inward field-of-view extending toward the wearer and generating the inner video stream from the inward field-of-view;
detecting a co-occurrence of a visual event in the outward field-of-view with a mouth gesture in the inward field-of-view and with a candidate pronunciation of a word, the visual event being represented in the outer video stream, the mouth gesture being represented in the inner video stream, the candidate pronunciation being represented in the audio stream;
determining that the visual event is correlated by a database to the word and to a reference pronunciation of the word; and
causing the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word.

A twenty-eighth example provides a system (e.g., a computer system) comprising:

one or more processors; and
a memory storing instructions that, when executed by at least one processor among the one or more processors, cause the system to perform operations comprising:
accessing a video stream and an audio stream both provided by a headset that includes an inwardly aimed camera and a microphone, the inwardly aimed camera having an inward field-of-view extending toward a wearer of the headset and generating the video stream from the inward field-of-view;
detecting a co-occurrence of a mouth gesture in the inward field-of-view with a candidate pronunciation of a word, the mouth gesture being represented in the video stream, the candidate pronunciation being represented in the audio stream;

determining that the word is correlated by a database to a reference pronunciation of the word; and

causing the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the mouth gesture with the candidate pronunciation of the word.

A twenty-ninth example provides a carrier medium carrying machine-readable instructions for controlling a machine to carry out the operations (e.g., method operations) performed in any one of the previously described examples.

What is claimed is:
1. A method comprising: accessing, by one or more processors of a machine, outer and inner video streams and an audio stream all provided by a headset that includes an outwardly aimed camera, an inwardly aimed camera, and a microphone, the outwardly aimed camera having an outward field-of-view extending away from a wearer of the headset and generating the outer video stream from the outward field-of-view, the inwardly aimed camera having an inward field-of-view extending toward the wearer and generating the inner video stream from the inward field-of-view; detecting, by the one or more processors of the machine, a co-occurrence of a visual event in the outward field-of-view with a mouth gesture in the inward field-of-view and with a candidate pronunciation of a word, the visual event being represented in the outer video stream, the mouth gesture being represented in the inner video stream, the candidate pronunciation being represented in the audio stream; determining, by the one or more processors of the machine, that the visual event is correlated by a database to the word and to a reference pronunciation of the word; and causing, by the one or more processors of the machine, the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word.
2. The method of claim 1, wherein: the causing of the headset to present the reference pronunciation of the word to the wearer of the headset includes: accessing a set of reference phonemes included in the reference pronunciation of the word; and causing a speaker in the headset to play the set of reference phonemes included in the reference pronunciation.
3. The method of claim 1, wherein: the outwardly aimed camera of the headset captures the word in the outward field-of-view; and in the detected co-occurrence, the visual event in the outward field-of-view includes a hand performing at least one of: handwriting the word, tracing the word, pointing at the word, touching the word, underlining the word, or highlighting the word.
4. The method of claim 1, wherein: the inwardly aimed camera of the headset captures a mouth of the wearer in the inward field-of-view; and in the detected co-occurrence, the mouth gesture in the inward field-of-view includes the mouth of the wearer sequentially making a candidate set of mouth shapes each configured to speak a corresponding candidate phoneme included in the candidate pronunciation of the word.
5. The method of claim 1, wherein: the inwardly aimed camera of the headset captures a mouth of the wearer in the inward field-of-view; the method further comprises: anonymizing the mouth gesture by cropping a portion of the inward field-of-view, the cropped portion depicting the mouth gesture without depicting any eye of the wearer of the headset; and wherein: in the detected co-occurrence, the anonymized mouth gesture in the inward field-of-view is detected within the cropped portion of the inward field-of-view.
6. The method of claim 1, further comprising: accessing a reference set of mouth shapes each configured to speak a corresponding reference phoneme included in the reference pronunciation of the word; and causing a display screen to display the accessed reference set of mouth shapes to the wearer of the headset.
7. The method of claim 6, wherein: the headset and the display screen are caused to contemporaneously present the reference pronunciation of the word to the wearer of the headset and display the accessed reference set of mouth shapes to the wearer of the headset.
8. The method of claim 6, wherein: the causing of the display screen to display the accessed reference set of mouth shapes includes combining the reference set of mouth shapes with an image that depicts a mouth of the wearer and causing the display screen to display a resultant combination of the image and the reference set of mouth shapes.
9. The method of claim 1, wherein: the outwardly aimed camera of the headset captures a physical model that represents the word in the outward field-of-view; and in the detected co-occurrence, the visual event in the outward field-of-view includes a hand performing at least one of: touching the physical model, grasping the physical model, moving the physical model, or rotating the physical model.
10. The method of claim 1, wherein: the outwardly aimed camera of the headset captures a hand of the wearer in the outward field-of-view; and in the detected co-occurrence, the visual event in the outward field-of-view includes the hand performing a trigger gesture that indicates a correction request for correction of the candidate pronunciation.
11. The method of claim 10, wherein: the causing of the headset to present the reference pronunciation of the word fulfills the request indicated by the trigger gesture performed by the hand of the wearer.
12. The method of claim 1, wherein: the reference pronunciation presented in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word includes an over-articulated pronunciation of the word.
13. The method of claim 1, wherein: the outwardly aimed camera includes a thermal imaging component; and in the detected co-occurrence, the visual event in the outward field-of-view is detected based on a thermal image of a hand of the wearer of the headset.
14. The method of claim 1, wherein: the inwardly aimed camera includes a thermal imaging component; and in the detected co-occurrence, the mouth gesture in the inward field-of-view is detected based on a thermal image of a tongue of the wearer of the headset.
15. The method of claim 1, wherein: the headset further includes an eye-tracking camera having a further field-of-view and configured to capture an eye orientation of the wearer in the further field-of-view; the method further comprises: determining a direction in which the eye of the wearer is looking based on the eye orientation of the wearer; and wherein: in the detected co-occurrence, the visual event in the outward field-of-view is detected based on the determined direction in which the eye of the wearer is looking.
16. The method of claim 1, wherein: the headset further includes an anemometer configured to detect a breath velocity of the wearer of the headset; and the causing of the headset to present the reference pronunciation of the word is based on the detected breath velocity of the wearer of the headset.
17. The method of claim 1, wherein: the headset further includes a biosensor configured to detect a stress level of the wearer of the headset; and the method further comprises: triggering presentation of an indication that the wearer of the headset take a rest break based on the detected stress level of the wearer.
18. The method of claim 1, wherein: the headset is communicatively coupled to a biosensor configured to detect a skin condition of the wearer of the headset; the method further comprises: determining a playback speed at which the reference pronunciation is to be presented to the wearer based on the skin condition detected by the biosensor; and wherein: the causing of the headset to present the reference pronunciation of the word includes causing the reference pronunciation to be played at the playback speed determined based on the skin condition.
19. The method of claim 1, wherein: the headset is communicatively coupled to a biosensor configured to detect a heartrate of the wearer of the headset; the method further comprises: determining a playback speed at which the reference pronunciation is to be presented to the wearer based on the heartrate detected by the biosensor; and wherein: the causing of the headset to present the reference pronunciation of the word includes causing the reference pronunciation to be played at the playback speed determined based on the heartrate.
20. The method of claim 1, wherein: the headset is communicatively coupled to a biosensor configured to produce an electroencephalogram of the wearer of the headset; the method further comprises: determining a playback speed at which the reference pronunciation is to be presented to the wearer based on the electroencephalogram produced by the biosensor; and wherein: the causing of the headset to present the reference pronunciation of the word includes causing the reference pronunciation to be played at the playback speed determined based on the electroencephalogram.
21. The method of claim 1, wherein: the headset is communicatively coupled to a set of accelerometers included in a collar worn by the wearer of the headset; the method further comprises: detecting a pattern of muscular movements based on accelerometer data generated by the set of accelerometers in the collar; and wherein: the causing of the headset to present the reference pronunciation of the word is based on the detected pattern of muscular movements.
22. The method of claim 21, wherein: the headset is communicatively coupled to a set of neuromuscular electrical muscle stimulators included in the collar worn by the wearer of the headset; the detected pattern of muscular movements is a candidate pattern of muscular movements made by the wearer in speaking the candidate pronunciation of the word; and the method further comprises: accessing a reference pattern of muscular movements configured to speak the reference pronunciation of the word; and causing the neuromuscular electrical muscle stimulators in the collar to stimulate a set of muscles of the wearer based on the accessed reference pattern of muscular movements.
23. The method of claim 1, wherein: the headset includes an outwardly aimed laser emitter configured to designate an object in the outward field-of-view by causing a spot of laser light to appear on a surface of the object in the outward field-of-view; the outwardly aimed camera of the headset is configured to capture the spot of laser light and the designated object in the outward field-of-view; the designated object is correlated by the database to the word and to the reference pronunciation of the word; and in the detected co-occurrence, the visual event in the outward field-of-view includes the spot of laser light being caused to appear on the surface of the designated object in the outward field-of-view.
24. The method of claim 1, wherein: the headset includes a stereoscopic depth sensor configured to detect a distance to an object in the outward field-of-view; the outwardly aimed camera of the headset is configured to capture a hand of the wearer of the headset designating the object by touching the object at the distance in the outward field-of-view; the designated object is correlated by the database to the word and to the reference pronunciation of the word; and in the detected co-occurrence, the visual event in the outward field-of-view includes the hand of the wearer touching the designated object in the outward field-of-view.
25. The method of claim 1, further comprising: performing a comparison of candidate phonemes in the candidate pronunciation of the word to reference phonemes in the reference pronunciation of the word; and recommending a pronunciation tutorial to the wearer of the headset based on the comparison of the candidate phonemes to the reference phonemes.
26. A machine-readable medium comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising: accessing outer and inner video streams and an audio stream all provided by a headset that includes an outwardly aimed camera, an inwardly aimed camera, and a microphone, the outwardly aimed camera having an outward field-of-view extending away from a wearer of the headset and generating the outer video stream from the outward field-of-view, the inwardly aimed camera having an inward field-of-view extending toward the wearer and generating the inner video stream from the inward field-of-view; detecting a co-occurrence of a visual event in the outward field-of-view with a mouth gesture in the inward field-of-view and with a candidate pronunciation of a word, the visual event being represented in the outer video stream, the mouth gesture being represented in the inner video stream, the candidate pronunciation being represented in the audio stream; determining that the visual event is correlated by a database to the word and to a reference pronunciation of the word; and causing the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word.
27. A system comprising: one or more processors; and a memory storing instructions that, when executed by at least one processor among the one or more processors, cause the system to perform operations comprising: accessing outer and inner video streams and an audio stream all provided by a headset that includes an outwardly aimed camera, an inwardly aimed camera, and a microphone, the outwardly aimed camera having an outward field-of-view extending away from a wearer of the headset and generating the outer video stream from the outward field-of-view, the inwardly aimed camera having an inward field-of-view extending toward the wearer and generating the inner video stream from the inward field-of-view; detecting a co-occurrence of a visual event in the outward field-of-view with a mouth gesture in the inward field-of-view and with a candidate pronunciation of a word, the visual event being represented in the outer video stream, the mouth gesture being represented in the inner video stream, the candidate pronunciation being represented in the audio stream; determining that the visual event is correlated by a database to the word and to a reference pronunciation of the word; and causing the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the visual event with the mouth gesture and with the candidate pronunciation of the word.
28. A system comprising: one or more processors; and a memory storing instructions that, when executed by at least one processor among the one or more processors, cause the system to perform operations comprising: accessing a video stream and an audio stream both provided by a headset that includes an inwardly aimed camera and a microphone, the inwardly aimed camera having an inward field-of-view extending toward a wearer of the headset and generating the video stream from the inward field-of-view; detecting a co-occurrence of a mouth gesture in the inward field-of-view with a candidate pronunciation of a word, the mouth gesture being represented in the video stream, the candidate pronunciation being represented in the audio stream; determining that the word is correlated by a database to a reference pronunciation of the word; and causing the headset to present the reference pronunciation of the word to the wearer in response to the detected co-occurrence of the mouth gesture with the candidate pronunciation of the word.